Why Git?

Two reasons that we’ll be most interested in (to start):

  • It’s a great way to safely store organized versions of a project

  • It’s ideal for data science collaborators who need to share & edit code together

Watch this video before or after today’s session

Some notes:

A repository (repo) on GitHub is the same unit as an RStudio Project: a place where you can store all the code, data and other information related to whatever project you’re working on.

When a repository on GitHub is connected to a Project in RStudio, you can pull information from GitHub into RStudio, and push information from RStudio to GitHub, where it is safely stored and your collaborators can access it. GitHub also keeps a complete history of every committed version, which you and your collaborators can review at any time, from anywhere with an internet connection.

A couple of general tips:

  • Pull frequently: whenever you start working on a project again (even on the same device), and especially if you are working with anyone else

  • Commit/push in small, meaningful increments, with useful (searchable, descriptive) commit messages

  • The best way to deal with merge conflicts is to avoid creating them. A merge conflict happens when two people pull the same version, edit the same part of the code separately, and then both try to push their changes back to the repo. Communicate with collaborators so you’re not all working on the same piece of code at the same time.

Exercises

PART 1. Fork & Clone an existing repo

PART 2. Create your own repo and version controlled R project from scratch

PART 3. Use the GitHub Classroom to fork a project, organise folders, run a script

Make sure you complete this worksheet: it will be vital for submitting your second summative assignment.

Learning outcomes

  • Become a master of reproducible research & version control with GitHub

  • Improve project structure

  • Write informative notes for collaborators & archiving

PART 1. Fork & clone an existing repo on GitHub, make edits, push back

  1. Go to github.com and log in (you need your own account; if you don’t have one, sign up with your uea.ac.uk e-mail)

  2. In the Search bar, look for repo Philip-Leftwich/5023Y-Happy-Git

  3. Click on the repo name, and look at the existing repo structure

  4. FORK the repo

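  5. On your fork, press the green Code (Clone/download) button, copy the URL, and in RStudio create a new project from a Git repository (File > New Project > Version Control > Git) using that URL. (Note: you may be asked to enable Git and/or to provide your GitHub username and password. GitHub no longer accepts your account password here, so use a personal access token instead.)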
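Forking and cloning can also be done from the R console. A minimal sketch using the {usethis} package, assuming it is installed and that a GitHub personal access token is already stored (usethis::create_github_token() and gitcreds::gitcreds_set() are the usual helpers for that). The interactive() guard is only there so the call never runs while a document is being knitted.

```r
library(usethis)

# fork = TRUE: create a fork under your own GitHub account, then clone
# that fork (not the original repo) into a new RStudio Project.
# Assumes a GitHub personal access token is already configured.
if (interactive()) {
  create_from_github("Philip-Leftwich/5023Y-Happy-Git", fork = TRUE)
}
```

With fork = TRUE the local copy is a clone of your fork, so your pushes go to your own account. By default the project folder is created somewhere conspicuous such as your Desktop; the destdir argument lets you choose another location.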

  6. Open the some_cool_animals.Rmd document and the accompanying .html file

  7. Add your name to the top of the document

  8. BUT WAIT. We have forgotten a very important species: Baby Yoda! Add an image and the facts below

FACTS

  • Also known as “The Child”

  • Likes unfertilised frog eggs & control knobs

  • Strong with the force

  9. Once you’ve added Grogu, knit the .Rmd document to update the .html
  10. Stage, commit & push all files

Stage: select the files whose changes you want to include in the next commit

Commit: records the staged changes as a single snapshot in your local repo, together with a short descriptive message

Push: sends your commits from the local repo to the remote repo on GitHub
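To see what staging and committing do under the hood, here is a minimal sketch using the {gert} package on a throwaway repo in a temporary folder. The file name, commit message and author details are hypothetical; in this worksheet you would normally use the Git pane in RStudio instead.

```r
library(gert)

# Throwaway repo in a temporary folder, so no real project is touched
repo <- tempfile("git-demo-")
dir.create(repo)
git_init(repo)

# Hypothetical author details, used only for this demo
me <- git_signature("Your Name", "you@example.com")

# Edit: create (or change) a file in the working folder
writeLines("Grogu: strong with the force", file.path(repo, "animals.txt"))

# Stage: pick which changed files go into the next commit
git_add("animals.txt", repo = repo)

# Commit: bind the staged changes into one snapshot with a message
git_commit("Add Grogu facts", author = me, repo = repo)

# The history now holds one commit
history <- git_log(repo = repo)
history[, c("commit", "message")]

# Pushing would be git_push(repo = repo), which needs a GitHub remote;
# this demo repo has none, so it is not run here
```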

  11. On GitHub, refresh the page and check that the files are updated. Cool! Now you’ve used something someone else created, customized it, and saved your updated version.

PART 2. Create your own repo & version controlled R Project from scratch

“But I forgot how to code over Christmas!!!”

Today we will have a bit of a play with tidyverse tools and plotting to refresh your memory!

Remember: if your code doesn’t run, it is usually a simple fix. Read the error message carefully and see how many of these bingo tiles you can pick up!

  1. Go back to your GitHub account

  2. Click on the plus sign (upper right, by your profile pic/icon) to create a new repository

  3. Name the repo 5023Y-second-repo-yourinitials (like 5023Y-second-repo-PL), and select to initialize with a ReadMe

  4. Edit the ReadMe (however you want - notice that markdown formatting works & you can see a preview) & commit

Some tips:

Include a title, introduction/objectives

  5. Clone the repo to create a connected R Project in RStudio

  6. Create a new R Markdown document

  7. Attach the {tidyverse}, {palmerpenguins} and {plotly} packages in a hidden code chunk (include = FALSE)

  8. Create an interactive plot of the palmerpenguins data with {plotly}, showing the output but not the code or messages (echo = FALSE, message = FALSE)
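Attaching packages in a hidden chunk and hiding the code of the plot chunk both come down to R Markdown chunk options. A minimal sketch of what the hidden setup chunk could contain, assuming its header is {r setup, include = FALSE}. Setting the defaults once with knitr::opts_chunk$set() is one option; the other is to write echo = FALSE, message = FALSE in the header of the plot chunk only.

```r
# Contents of the setup chunk; its header would be {r setup, include = FALSE}
# include = FALSE runs this code but shows neither the code nor its output
library(tidyverse)
library(palmerpenguins)
library(plotly)

# Defaults for all later chunks: show results, hide code, messages and warnings
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
```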

penguin_graph <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(aes(size = body_mass_g, 
                 color = species),
             alpha = 0.4) +
  scale_color_manual(values = c("purple","orange","black")) +
  theme_minimal() +
  labs(x = "bill length (mm)",
       y = "bill depth (mm)",
       title = "Penguin measurements")

# tooltip takes aesthetic names, not column names: "colour" shows the species
ggplotly(penguin_graph, tooltip = c("colour"))
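In ggplotly(), the tooltip argument selects aesthetics (such as "x", "y" or "colour"), not column names. If you would rather control the hover text yourself, a common pattern is the unofficial text aesthetic, sketched below on the same palmerpenguins data; the label format built with paste0() is just an example.

```r
library(ggplot2)
library(plotly)

penguins_df <- palmerpenguins::penguins

# 'text' is not a ggplot2 aesthetic, but ggplotly() can use it for hover labels
p <- ggplot(penguins_df,
            aes(x = bill_length_mm, y = bill_depth_mm, colour = species,
                text = paste0(species, ", ", body_mass_g, " g"))) +
  geom_point(alpha = 0.4) +
  theme_minimal()

# Show only the custom text when hovering over a point
hover_plot <- ggplotly(p, tooltip = "text")
hover_plot
```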
  9. Knit & save your .Rmd
  10. Stage, commit & push back to GitHub

PART 3. GitHub Classroom enabled R Projects with subfolders

  1. Follow this invite link

  2. You will be invited to sign in to GitHub (if you haven’t already) and to join the UEABIO organisation

  3. Clone your assignment to work locally in RStudio

  4. In your local project folder, create the subfolders ‘data’ and ‘final_graphs’ (use dir.create() in the console)

  5. Drop the file disease_burden.csv into the ‘data’ subfolder (the easiest way is with the RStudio Files pane; from the console, base R’s file.rename() moves a file, much like mv in Bash)

  6. Open a new .R script

  7. Attach the {tidyverse} and {janitor} packages

  8. Read in the disease_burden.csv data

This file contains the death rate for every country in the world across four decades.
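Creating the folders and moving the file can both be done from the console. A minimal sketch, assuming your working directory is the project folder; the download location stored in src is hypothetical and needs changing to wherever your copy of disease_burden.csv actually is.

```r
# Run from the project root (the folder containing your .Rproj file)
# showWarnings = FALSE keeps it quiet if a folder already exists
dir.create("data", showWarnings = FALSE)
dir.create("final_graphs", showWarnings = FALSE)

# Hypothetical location of your downloaded copy; change it to match yours
src <- "~/Downloads/disease_burden.csv"

# file.rename() moves the file; the if() skips the move when the file isn't there
if (file.exists(src)) {
  file.rename(from = src, to = file.path("data", "disease_burden.csv"))
}

list.files()   # 'data' and 'final_graphs' should now be listed
```

file.rename() can fail when moving between different drives; in that case, file.copy() followed by file.remove() does the same job.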

  9. Stage, commit & push at this point. Notice that the empty folder ‘final_graphs’ doesn’t show up on GitHub: Git only tracks files, so an empty folder is never committed
  10. Back in the script, write a few lines to read and clean the data.

Use %>% and filter() to pull out “United States”, “Japan”, “Afghanistan” & “Somalia”

We want to look at the death rate for both sexes combined at 0-6 days, so use filter() here as well.

Assign this to a new object

library(tidyverse)
library(janitor)

db <- read_csv("data/disease_burden.csv") %>%
  clean_names() %>%
  rename(deaths_per_100k = death_rate_per_100_000)

# View(db)

# Subset (US & Japan = low infant death rates; Afghanistan & Somalia = high infant death rates)

db_sub <- db %>%
  filter(country_name %in% c("United States", "Japan", "Afghanistan", "Somalia")) %>%
  filter(age_group == "0-6 days", sex == "Both")
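Two parts of the cleaning pipeline are easier to understand on a tiny example: what clean_names() does to column headers (for example, a header written like "Death Rate Per 100,000" would become death_rate_per_100_000, the name the rename() expects), and why the country filter uses %in% rather than ==. The headers and the rows in toy below are made up for illustration; they are not taken from disease_burden.csv.

```r
library(dplyr)
library(janitor)

# What clean_names() does to typical spreadsheet-style headers
make_clean_names(c("Country Name", "Death Rate Per 100,000"))

# Toy rows with made-up values, only to show how the filters behave
toy <- tibble(
  country_name = c("Japan", "France", "Somalia", "Japan"),
  sex          = c("Both", "Both", "Male", "Both"),
  age_group    = c("0-6 days", "0-6 days", "0-6 days", "7-27 days")
)
keep <- c("United States", "Japan", "Afghanistan", "Somalia")

# %in% asks, row by row, "is this country one of the four?"
right <- toy %>%
  filter(country_name %in% keep) %>%
  filter(age_group == "0-6 days", sex == "Both")
right   # one row: Japan, Both, 0-6 days

# == instead compares row i with keep[i] (recycling), so it silently
# drops rows it should keep: here it returns no rows at all
wrong <- toy %>% filter(country_name == keep)
wrong
```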
  11. Make a ggplot of deaths per 100K by country across the four decades

HINT - use geom_line() and remember to separate countries by colour

ggplot(data = db_sub) +
  geom_line(aes(x = year,
                 y = deaths_per_100k,
                 color = country_name)) +
  scale_color_manual(values = c("black", "blue", "magenta", "orange"))
  12. Update your graph with direct labels (using annotate()) and vertical or horizontal lines (geom_vline() or geom_hline())
# Graph
# New things: annotation + vertical line

ggplot(data = db_sub) +
  geom_line(aes(x = year,
                 y = deaths_per_100k,
                 color = country_name)) +
  scale_color_manual(values = c("black", "blue", "magenta", "orange")) +
  annotate(geom = "text",
           x = 1985,
           y = 2.2e5,
           label = "Afghanistan",
           size = 2.5) +
  geom_vline(xintercept = 2000,
             lty = 2) +
  theme_minimal()
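The code above uses geom_vline(); geom_hline() works the same way with yintercept. A minimal sketch combining a horizontal reference line with a direct label, drawn on ggplot2's built-in economics dataset purely as a stand-in for the disease burden data (here the line marks the mean of the plotted series).

```r
library(ggplot2)

mean_unemploy <- mean(economics$unemploy)

p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  # Horizontal reference line at the series mean
  geom_hline(yintercept = mean_unemploy, linetype = "dashed") +
  # Direct label placed in data coordinates (x is a Date here)
  annotate(geom = "text",
           x = as.Date("1975-01-01"),
           y = mean_unemploy,
           label = "mean",
           vjust = -0.5,
           size = 3) +
  labs(x = "Year", y = "Unemployed (thousands)") +
  theme_minimal()
p
```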

  13. Use ggsave() to write your graph to a .png in the ‘final_graphs’ subfolder
# Saves the last plot displayed; width and height are in inches by default
ggsave("final_graphs/disease_graph.png", width = 5, height = 3)
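Called like this, ggsave() saves the last plot displayed. A minimal sketch of the more explicit form, which passes the plot object with plot = so you always know which graph ends up in the file; mtcars and the temporary output path are stand-ins for your own graph and the final_graphs folder.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Temporary path for this demo; in the worksheet it would be
# "final_graphs/disease_graph.png"
out <- file.path(tempdir(), "demo_graph.png")

# plot = p saves this exact plot rather than the last one displayed;
# width and height are in inches unless 'units' says otherwise
ggsave(out, plot = p, width = 5, height = 3, units = "in", dpi = 300)

file.exists(out)
```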
  14. Save, stage, commit & push

  15. Check that your changes are stored on GitHub

(Note: this repo will appear under the UEABIO organisation rather than in your own list of repositories)

END

Make sure you finish all three exercises before next week to become a GitHub pro!

Want to learn about all things Git, GitHub and R? See Happy Git and GitHub for the useR (https://happygitwithr.com/)